Gene Selection Using Logistic Regressions Based on AIC, BIC and MDL Criteria

Authors

  • XIAOBO ZHOU
  • XIAODONG WANG
Abstract

In microarray-based cancer classification, gene selection is an important issue owing to the large number of variables (gene expressions) and the small number of experimental conditions. Many gene-selection and classification methods have been proposed; however, most of these treat gene selection and classification separately, and not under the same model. We propose a Bayesian approach to gene selection using the logistic regression model. The Akaike information criterion (AIC), the Bayesian information criterion (BIC) and the minimum description length (MDL) principle are used in constructing the posterior distribution of the chosen genes. The same logistic regression model is then used for cancer classification. Fast implementation issues for these methods are discussed. The proposed methods are tested on several data sets including those arising from hereditary breast cancer, small round blue-cell tumors, lymphoma, and acute leukemia. The experimental results indicate that the proposed methods show high classification accuracies on these data sets. Some robustness and sensitivity properties of the proposed methods are also discussed. Finally, mixing logistic-regression-based gene selection with other classification methods and mixing logistic-regression-based classification with other gene-selection methods are considered.
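To make the role of the criteria concrete, here is a minimal sketch of scoring candidate gene subsets with AIC and BIC under a logistic regression model. It is not the authors' Bayesian procedure: it fits each subset by plain gradient ascent and compares the standard penalized log-likelihoods, AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L. The toy expression matrix, the function names (`fit_logistic`, `score_subset`), and the step size and iteration count are all assumptions made for illustration; MDL is left out.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(w, xi):
    # w[0] is the intercept; w[1:] are the per-gene coefficients.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))

def fit_logistic(X, y, steps=2000, lr=0.1):
    # Plain gradient ascent on the mean log-likelihood (illustrative, not tuned).
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(steps):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = yi - predict(w, xi)
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w

def log_likelihood(w, X, y):
    ll = 0.0
    for xi, yi in zip(X, y):
        p = min(max(predict(w, xi), 1e-12), 1.0 - 1e-12)  # guard against log(0)
        ll += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return ll

def aic(ll, k):
    return 2 * k - 2 * ll

def bic(ll, k, n):
    return k * math.log(n) - 2 * ll

def score_subset(X, y, genes):
    # Restrict the expression matrix to the chosen genes, fit, and score.
    Xs = [[row[g] for g in genes] for row in X]
    w = fit_logistic(Xs, y)
    ll = log_likelihood(w, Xs, y)
    k = len(genes) + 1  # coefficients plus intercept
    return aic(ll, k), bic(ll, k, len(y))

# Hypothetical toy data: 8 samples, 3 genes; only gene 0 tracks the class label.
X = [
    [0.2, 1.0, 0.5], [0.4, 0.1, 0.9], [0.9, 0.8, 0.1], [0.3, 0.5, 0.4],
    [1.1, 0.9, 0.6], [1.3, 0.2, 0.8], [0.5, 0.7, 0.3], [1.5, 0.4, 0.2],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

for genes in ([0], [1], [0, 1], [0, 1, 2]):
    a, b = score_subset(X, y, genes)
    print(genes, "AIC=%.3f" % a, "BIC=%.3f" % b)
```

Lower values indicate a better trade-off between fit and model size. With far more genes than samples, as in microarray data, exhaustive subset scoring of this kind is infeasible, which is presumably why the abstract mentions fast implementation issues.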

Related resources

Geometric BIC

The author introduced the “geometric AIC” and the “geometric MDL” as model selection criteria for geometric fitting problems. These correspond to Akaike’s “AIC” and Rissanen’s “MDL”, respectively, well known in the statistical estimation framework. Another well-known criterion is Schwarz’s “BIC”, but its counterpart for geometric fitting has been unknown. This paper introduces the corresponding ...

Model Selection using Information Theory and the MDL Principle

Information theory offers a coherent, intuitive view of model selection. This perspective arises from thinking of a statistical model as a code, an algorithm for compressing data into a sequence of bits. The description length is the length of this code for the data plus the length of a description of the model itself. The length of the code for the data measures the fit of the model to the dat...

Investigation on Several Model Selection Criteria for Determining the Number of Cluster

Determining the number of clusters is a crucial problem in clustering. Conventionally, the number of clusters was selected via cost-function-based criteria such as Akaike’s information criterion (AIC), the consistent Akaike’s information criterion (CAIC), and the minimum description length (MDL) criterion, which formally coincides with the Bayesian inference criterion (BIC). In...

Minimum Description Length Model Selection Criteria for Generalized Linear Models

This paper derives several model selection criteria for generalized linear models (GLMs) following the principle of Minimum Description Length (MDL). We focus our attention on the mixture form of MDL. Normal or normal-inverse gamma distributions are used to construct the mixtures, depending on whether or not we choose to account for possible over-dispersion in the data. For the latter, we use E...

Catching Up Faster by Switching Sooner: A Prequential Solution to the AIC-BIC Dilemma

Bayesian model averaging, model selection and its approximations such as BIC are generally statistically consistent, but sometimes achieve slower rates of convergence than other methods such as AIC and leave-one-out cross-validation. On the other hand, these other methods can be inconsistent. We identify the catch-up phenomenon as a novel explanation for the slow convergence of Bayesian methods...

Publication date: 2005